Computational and Structural Biotechnology Journal — Latest Matching Preprints

1

Strong interactions between highly-dynamic lamina-associated domains and the nuclear envelope stabilize the 3D architecture of Drosophila interphase chromatin

Tolokh, I. S.; Kinney, N. A.; Sharakhov, I. V.; Onufriev, A. V.

2023-01-17 genomics 10.1101/2022.01.28.478236 medRxiv

Top 0.1%

35.1%

Background: Interactions among topologically associating domains (TADs), and between the nuclear envelope (NE) and lamina-associated domains (LADs), are expected to shape various aspects of 3D chromatin structure and dynamics; however, relevant genome-wide experiments that may provide statistically significant conclusions remain difficult.
Results: We have developed a coarse-grained dynamical model of the Drosophila melanogaster nuclei at TAD resolution that explicitly accounts for four distinct epigenetic classes of TADs and LAD-NE interactions. The model is parameterized to reproduce the experimental Hi-C map of the wild type (WT) nuclei; it describes time evolution of the chromatin over the G1 phase of the interphase. Best agreement with the experiment is achieved when the simulations include an ensemble of nuclei, corresponding to the experimentally observed set of several possible mutual arrangements of chromosomal arms. The model is validated against multiple structural features of chromatin from several different experiments not used in model development, including those that describe changes in chromatin induced by lamin depletion. Predicted positioning of all LADs at the NE is highly dynamic: the same LAD can attach, detach and move far away from the NE multiple times during interphase. The probabilities of LADs to be in contact with the NE vary by an order of magnitude, despite all having the same affinity to the NE in the model. These probabilities are mostly determined by a highly variable local linear density of LADs along the genome, which also has a strong effect on the predicted radial positioning of individual TADs. Higher probability of a TAD to be near the NE is largely determined by a higher linear density of LADs surrounding this TAD. The distribution of LADs along the chromosome chains plays a notable role in maintaining a non-random average global structure of chromatin. Relatively high affinity of LADs to the NE in the WT nuclei substantially reduces sensitivity of the global radial chromatin distribution to variations in the strength of TAD-TAD interactions compared to the lamin depleted nuclei, where a 0.5 kT increase of cross-type TAD-TAD interactions doubles the chromatin density in the central nucleus region.
Conclusions: A dynamical model of the entire fruit fly genome makes multiple genome-wide predictions of biological interest. The distribution of LADs along the chromatin chains affects their probabilities to be in contact with the NE and radial positioning of highly mobile TADs, playing a notable role in creating a non-random average global structure of the chromatin. We conjecture that an important role of attractive LAD-NE interactions is to stabilize global chromatin structure against inevitable cell-to-cell variations in TAD-TAD interactions.

2

Genes and Pathways Comprising the Human and Mouse ORFeomes Display Distinct Codon Bias Signatures that Can Regulate Protein Levels

Davis, E. T.; Raman, R.; Byrne, S. R.; Ghanegolmohammadi, F.; Mathur, C.; Begley, U.; Dedon, P.; Begley, T. J.

2025-02-04 genomics 10.1101/2025.02.03.636209 medRxiv

Top 0.1%

33.7%

Arginine, glutamic acid and selenocysteine based codon bias has been shown to regulate the translation of specific mRNAs for proteins that participate in stress responses, cell cycle and transcriptional regulation. Defining codon-bias in gene networks has the potential to identify other pathways under translational control. Here we have used computational methods to analyze the ORFeome of all unique human (19,711) and mouse (22,138) open-reading frames (ORFs) to characterize codon-usage and codon-bias in genes and biological processes. We show that ORFeome-wide clustering of gene-specific codon frequency data can be used to identify ontology-enriched biological processes and gene networks, with developmental and immunological programs well represented for both humans and mice. We developed codon over-use ontology mapping and hierarchical clustering to identify multi-codon bias signatures in human and mouse genes linked to signaling, development, mitochondria and metabolism, among others. The most distinct multi-codon bias signatures were identified in human genes linked to skin development and RNA metabolism, and in mouse genes linked to olfactory transduction and ribosome, highlighting species-specific pathways potentially regulated by translation. Extreme codon bias was identified in genes that included transcription factors and histone variants. We show that re-engineering extreme usage of C- or U-ending codons for aspartic acid, asparagine, histidine and tyrosine in the transcription factors CEBPB and MIER1, respectively, significantly regulates protein levels. Our study highlights that multi-codon bias signatures can be linked to specific biological pathways and that extreme codon bias with regulatory potential exists in transcription factors for immune response and development. 

3

Digital Reprogramming Decodes Epigenetic Barriers of Cell Fate Changes

Janeva, A.; Penfold, C. A.; Llorente-Armijo, S.; Li, H.; Zikmund, T.; Stock, M.; Jullien, J.; Straub, T.; Forne, I.; Imhof, A.; Vaquerizas, J. M.; Gurdon, J. B.; Hoermanseder, E. B.

2025-01-24 genomics 10.1101/2025.01.22.634227 medRxiv

Top 0.1%

28.4%

The fates of differentiated cells in our body can be induced to change by nuclear reprogramming. In this way, cells valuable for therapeutic purposes and disease modeling can be produced. However, the efficiency of this process is low, partly due to properties of somatic donor nuclei which stabilize their differentiated fate but also act as barriers to reprogramming-associated cell fate changes. The identity of these reprogramming barriers is not fully understood. Here, we developed an artificial intelligence-based approach to model nuclear reprogramming and used it to identify the chromatin modification H3K27ac as an epigenetic barrier to reprogramming-induced cell fate changes. Using reprogramming by nuclear transfer (NT) to eggs of Xenopus laevis as a model system, we profiled chromatin modifications in differentiated cell types alongside gene expression patterns before and after reprogramming. Our model integrated the data, and by leveraging model predictions we find that genes resisting inactivation during reprogramming display chromatin modification barcodes. This revealed H3K27ac as a novel candidate barrier to NT reprogramming. Reducing H3K27ac levels using p300/CBP inhibitors before reprogramming led to an improved downregulation of genes linked to H3K27ac-modified enhancers after reprogramming. Importantly, these effects were accompanied by improved embryonic development of the resulting nuclear transfer embryos. In summary, our study identified H3K27ac as a safeguarding mechanism of cellular identities and as a reprogramming barrier during NT. Hence, the "Digital Reprogramming" approach developed here is capable of modelling and improving current cell-fate reprogramming strategies.

4

Computational Design and Atomistic Validation of a High-Affinity VHH Nanobody Targeting the PI/RuvC Interface of Streptococcus pyogenes Cas9: A Bivalent Hub Strategy for CRISPR-Cas9 Enhancement

Kumar, N.; Dalal, D.; Sharma, V.

2026-03-25 bioinformatics 10.64898/2026.03.22.713495 medRxiv

Top 0.1%

27.2%

The CRISPR-Cas9 system has revolutionized genome engineering, yet its full therapeutic potential remains constrained by challenges in precisely modulating its activity and specificity. Here we report a fully computational end-to-end pipeline for the de novo design of a single-domain VHH nanobody (NbSpCas9-v1) targeting a structurally conserved, non-catalytic epitope at the PAM-interacting (PI) and RuvC-III interface of Streptococcus pyogenes Cas9 (SpCas9; PDB: 4UN3). Nanobody sequences were generated using BoltzGen, a generative diffusion binder design framework, and co-folded with SpCas9 using Boltz-2 to evaluate structural confidence and binding affinity. The top-ranked model (SpCas9_4UN3_Bivalent_Hub_v1) achieved a complex pLDDT of 0.8406, an aggregate score of 0.8016, and an ipTM of >0.8, indicating high confidence in the nanobody-antigen interface. The designed 1,616-residue quaternary complex (SpCas9 + sgRNA + DNA + nanobody) was subjected to 10 ns of all-atom molecular dynamics (MD) simulation using the AMBER14SB force field within the GROMACS/OpenMM framework. The complex stabilized at RMSD ~6 Å with a radius of gyration of 39-44 Å, confirming thermodynamic stability under physiological conditions (310 K, 0.15 M NaCl). A conserved 96.3 Å inter-molecular distance between the nanobody centroid and the HNH catalytic residue H840 establishes NbSpCas9-v1 as a distal, non-inhibitory binder, ideally suited for a Bivalent Hub architecture recruiting secondary effectors to the Cas9 ribonucleoprotein (RNP). The nanobody-Cas9 interface is stabilized by 8 hydrogen bonds, 4 salt bridges, and ~1,850 Å² of buried solvent-accessible surface area. These results provide a rigorous structural and dynamic foundation for experimental validation of VHH-based CRISPR-Cas9 enhancers and modulators.
Graphical abstract: The computational workflow proceeds from SpCas9 crystal structure acquisition (PDB: 4UN3) through BoltzGen nanobody design, Boltz-2 structural co-folding, 10 ns explicit-solvent MD validation, and Bivalent Hub functional characterization. The PyMOL rendering shows the full quaternary complex at atomistic resolution.

5

Visualizing Amino Acid Substitutions in a Physicochemical Vector Space

Nemzer, L. R.

2021-07-16 bioinformatics 10.1101/2021.07.15.452549 medRxiv

Top 0.1%

26.1%

A three-dimensional representation of the twenty proteinogenic amino acids in a physicochemical space is presented. Vectors corresponding to amino acid substitutions are classified based on whether they are accessible via a single-nucleotide mutation. It is shown that the standard genetic code establishes a "choice architecture" that permits nearly independent tuning of the properties related to size and those related to hydrophobicity. This work sheds light on the non-arbitrary benefits of evolvability that may have shaped the development of the standard genetic code to increase the probability that adaptive point mutations will be generated. Illustrations of the usefulness of visualizing amino acid substitutions in a 3D physicochemical space are shown using recent datasets collected regarding the SARS-CoV-2 receptor binding domain. First, the substitutions most responsible for antibody escape are almost always inaccessible via single-nucleotide mutation, and change multiple properties concurrently. Second, it is shown that assays of ACE2 binding by sarbecovirus variants, including the viruses responsible for SARS and COVID-19, are more easily understood when plotted with this method. The results of this research can extend our understanding of certain hereditary disorders caused by point mutations, as well as guide the development of rational protein and vaccine design.
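
The classification step described in this abstract (whether a substitution is reachable by a single-nucleotide mutation) can be illustrated with a minimal Python sketch. This is not the author's code: the function names are hypothetical, and the only input assumed is the standard genetic code, written here in the conventional compact form with bases ordered T, C, A, G.

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in
# the order T, C, A, G (third position varying fastest); '*' marks stops.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(aa: str) -> list[str]:
    """Return every codon that encodes the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def single_nt_accessible(aa_from: str, aa_to: str) -> bool:
    """True if some codon of aa_from is one nucleotide away from a codon of aa_to."""
    return any(
        hamming(c1, c2) == 1
        for c1 in codons_for(aa_from)
        for c2 in codons_for(aa_to)
    )


if __name__ == "__main__":
    for pair in [("N", "Y"), ("E", "K"), ("W", "F"), ("D", "K")]:
        print(pair, single_nt_accessible(*pair))
```

For example, asparagine to tyrosine is accessible (AAT to TAT), while tryptophan to phenylalanine is not, because TGG differs from both TTT and TTC at two positions.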

6

Third-nucleotide codon bias and synonymous codon bias define functional translational programs that shape human tissue and cancer proteomes.

Rashad, S.; Niizuma, K.

2025-11-03 genomics 10.1101/2025.10.31.685942 medRxiv

Top 0.1%

26.0%

Background: Codon usage bias is a universal feature of the genetic code, yet how synonymous codon bias or third-nucleotide codon bias (A/T- vs G/C-ending) shapes translation and proteome composition across tissues and cancer remains unclear.
Results: Using comparative genomics between human and rodent coding sequences, we uncovered a conserved codon-bias axis. A/T-ending codons consistently marked genes involved in proliferation and RNA processing, whereas G/C-ending codons were enriched for differentiation and neuronal functions. While GC3 scores, measuring the third-nucleotide codon bias, showed differences between humans and rodents due to recombination events, the functional dichotomy was conserved. Isoacceptor frequencies, measuring gene synonymous codon bias, were conserved from rodents to humans. Synonymous codons exhibited distinct functional enrichment patterns, demonstrating functional divergence at the codon level. Two new indices, the ANN-index and mG-index, reflecting codons decoded by the tA and mG tRNA modifications, linked tRNA modification biology to translation. Both indices correlated with proliferative, A/T-biased programs, providing a universal basis for their roles in cancer. Tissue proteomes showed strong RNA-protein discordance and distinct codon biases. Analysis of 21 cancer types revealed a global A/T-ending codon bias in cancer. Analysis of 2,600 cancer cell lines revealed codon bias heterogeneity in cell lines from the same cancer subtype that is not observable between cancer patients.
Conclusions: Our results define synonymous codon divergence and tRNA-modification indices as determinants of translational reprogramming. This work establishes a unified framework connecting codon usage, tRNA modifications, and proteome remodeling, providing a basis for rational design of mRNA and gene therapeutics.
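
As an illustration of the third-nucleotide bias mentioned above, the following Python sketch computes a simple GC3 score, taken here as the fraction of codons whose third base is G or C. This is an assumption about the simplest form of the metric, not the authors' pipeline: published variants often exclude stop codons and the single-codon amino acids (Met, Trp), which this sketch does not do, and the function name `gc3` is hypothetical.

```python
def gc3(cds: str) -> float:
    """Fraction of codons in an in-frame coding sequence that end in G or C.

    Simplifying assumption: every codon is counted, including the start
    codon and any stop codon present in the input.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    if not seq or len(seq) % 3 != 0:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    third_positions = seq[2::3]  # every third base, starting at index 2
    return sum(base in "GC" for base in third_positions) / len(third_positions)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA end in G, C, A respectively.
    print(gc3("ATGGCCAAA"))
```

Under this definition the A/T-ending fraction is simply `1 - gc3(cds)`, so one score captures both ends of the axis described in the abstract.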

7

Integrating Quantitative Histology with Clinical Data Improves Prediction of Cervical Intraepithelial Neoplasia Regression

Lehtonen, O.; Nordlund, N.; Kahelin, E.; Bergqvist, L.; Aro, K.; Hautaniemi, S.; Kalliala, I.; Virtanen, A.

2026-01-22 obstetrics and gynecology 10.64898/2026.01.21.26344510 medRxiv

Top 0.1%

24.0%

Cervical intraepithelial neoplasia grade 2 (CIN2) lesions show variable outcomes, and accurate prediction of regression remains a major clinical challenge. We developed an interpretable machine learning pipeline that integrates quantitative histological, clinical, and human papillomavirus (HPV) genotyping data to predict lesion regression within one and two years. Using panoptic segmentation of routine hematoxylin and eosin (H&E)-stained biopsies, we extracted human-interpretable morphological and immune cell infiltration-related features that capture the key histopathological characteristics of CIN2 and identified features that predicted lesion regression. Further, integrating these features with predictive clinical features achieved higher predictive accuracy than clinical variables alone. These findings demonstrate that quantitative, interpretable analysis of H&E histology of routine diagnostic biopsies contains relevant information that predicts the natural history of CIN2 lesions. Graphical abstract: created in BioRender. Lehtonen, O. (2026) https://BioRender.com/rlnkbkp

8

Molecular basis of Salla Disease: R39C Mutation Effects on the Lysosomal Transporter Sialin

Matsingos, C.; Lot, I.; Vaz, M.; Mailliart, J.; Boulayat, M.; Debacker, C.; Goupil-Lamy, A.; Gasnier, B.; Acher, F. C.; Anne, C.

2026-04-22 biochemistry 10.64898/2026.04.20.719580 medRxiv

Top 0.1%

23.9%

Salla disease is caused by a genetic mutation in sialin, a lysosomal membrane transporter, which exports sialic acid from lysosomes. Substrate translocation occurs via a rocker-switch mechanism that alternately exposes the substrate-binding site to the lysosomal lumen and the cytosol. The pathogenic mutation R39C found in most Salla disease patients decreases the lysosomal localisation and the transport activity. In this study, we used computational and mutagenesis approaches to elucidate the molecular effects of the R39C mutation. Using three-dimensional models of human sialin in the lumen-open (LO) and cytosol-open (CO) states combined with the mutagenesis of selected residues, we identify a critical "triplet" motif comprising R39, E194, and E262, which is associated with an ionic lock formed between K197 and D350 in the LO conformation. Molecular dynamics simulations suggest that the electrostatic triplet negatively modulates the ionic lock, and are consistent with a strengthened ionic lock in R39C sialin, potentially favouring the LO state. To assess the global effects of the R39C mutation, we computed dynamic cross-correlation matrices and identified correlation patterns consistent with an allosteric coupling between the ionic lock K197/D350 and the region surrounding the sialic acid binding site in wild-type sialin, whereas in the LO state of R39C sialin, this communication preferentially bypasses this region. Therefore, the R39C mutation may impede the LO to CO conformational transition required for sialic acid transport, providing a plausible mechanistic framework for the decreased transport activity, and possibly the decreased lysosomal localisation, observed in Salla disease. 
Highlights
- The R39 residue participates in an interaction triplet, which negatively regulates an ionic lock stabilising the lumen-open conformation
- The R39C mutation is associated with a stronger ionic lock in the simulations, and may favour the lumen-open state
- Correlation network analysis suggests an allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site
- The R39C mutation alters the inferred allosteric coupling between the ionic lock and the region surrounding the sialic acid binding site
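
The dynamic cross-correlation matrices mentioned in this abstract are conventionally defined as C_ij = <dr_i . dr_j> / sqrt(<dr_i^2> <dr_j^2>), where dr_i is the displacement of atom i from its mean position and the averages run over trajectory frames. The sketch below is a minimal pure-Python illustration of that textbook definition, not the authors' workflow. It assumes the frames are already superposed on a common reference and that every atom moves at least slightly (otherwise the normalisation divides by zero); the function name `dccm` and the toy trajectory are hypothetical.

```python
import math


def dccm(trajectory):
    """Dynamic cross-correlation matrix for a small trajectory.

    trajectory[t][i] is the (x, y, z) position of atom i in frame t.
    Returns a nested list C with C[i][j] in [-1, 1].
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])

    # Mean position of each atom over all frames.
    mean = [
        [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        for i in range(n_atoms)
    ]
    # Per-frame displacement of each atom from its mean position.
    disp = [
        [[frame[i][k] - mean[i][k] for k in range(3)] for i in range(n_atoms)]
        for frame in trajectory
    ]

    def covariance(i, j):
        # Time average of the dot product of the two displacement vectors.
        return sum(
            sum(d[i][k] * d[j][k] for k in range(3)) for d in disp
        ) / n_frames

    variance = [covariance(i, i) for i in range(n_atoms)]
    return [
        [covariance(i, j) / math.sqrt(variance[i] * variance[j]) for j in range(n_atoms)]
        for i in range(n_atoms)
    ]


if __name__ == "__main__":
    # Toy trajectory: atoms 0 and 1 move together along x, atom 2 moves opposite.
    toy = [
        [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(2.0, 0.0, 0.0), (7.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    ]
    for row in dccm(toy):
        print([round(value, 3) for value in row])
```

Values near +1 indicate atoms moving in the same direction, values near -1 indicate anti-correlated motion; patterns of this kind are what a correlation analysis inspects when inferring coupling between distant regions.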

9

Learning the histone codes of gene regulation with large genomic windows and three-dimensional chromatin interactions using transformer

Lee, D.; Yang, J.; Kim, S.

2021-12-30 genomics 10.1101/2021.12.30.472333 medRxiv

Top 0.1%

23.6%

The quantitative characterization of transcriptional control by histone modifications (HMs) has been tackled by many computational studies, but most of them still exploit only partial aspects of the intricate mechanisms involved in gene regulation, leaving room for improvement. We present Chromoformer, a new transformer-based deep learning architecture that achieves state-of-the-art performance in the quantitative deciphering of the histone codes of gene regulation. The core of the Chromoformer architecture lies in three variants of the attention operation, each specialized to model one level of the hierarchy of three-dimensional (3D) transcriptional regulation: (1) histone codes at core promoters, (2) pairwise interaction between a core promoter and a distal cis-regulatory element mediated by 3D chromatin interactions, and (3) the collective effect of the pairwise cis-regulations. In-depth interpretation of the trained model behavior based on attention scores suggests that Chromoformer adaptively exploits the distant dependencies between HMs associated with transcription initiation and elongation. We also demonstrate that the quantitative kinetics of transcription factories and polycomb group bodies, in which coordinated gene regulation occurs through spatial sequestration of genes with regulatory elements, can be captured by Chromoformer. Together, our study shows the great power of attention-based deep learning as a versatile modeling approach for the complex epigenetic landscape of gene regulation and highlights its potential as an effective toolkit that facilitates scientific discoveries in computational epigenetics.

10

Subtle changes at the RBD/hACE2 interface during SARS-CoV2 variant evolution: a molecular dynamics study

Gheeraert, A.; Leroux, V.; Mias-Lucquin, D.; Karami, Y.; Vuillon, L.; Chauvot de Bauchene, I.; Devignes, M.-D.; Rivalta, I.; Maigret, B.; Chaloin, L.

2024-12-13 bioinformatics 10.1101/2024.12.12.628120 medRxiv

Top 0.1%

23.3%

The SARS-CoV-2 Omicron variants behave differently from previous variants, particularly the Delta variant, as they seem to cause lower morbidity while being much more contagious. With this in mind, we performed new molecular dynamics (MD) simulations of the various spike RBD/hACE2 complexes corresponding to the WT, Delta and Omicron variants (BA.1 up to BA.4/5) over a 1.5 µs timescale. A comprehensive analysis of residue interactions within and between the two partners then allowed us to draw the profile of each variant using complementary methods (PairInt, hydrophobic potential, contact PCA). The main results of the PairInt calculations highlighted the residues most involved in electrostatic interactions, which contribute strongly to binding through highly stable contacts between spike RBD and hACE2 (importance of mutated residues at positions 417, 493 and 498). In addition to the swappable arginine residues (493/498), apolar contacts made a substantial and complementary contribution in Omicron, with the detection of two hydrophobic patches, one of which was correlated with energetic contribution calculations. This study sheds new light on the global dynamics of spike RBD/hACE2 complexes through the analysis of contact networks and cross-correlation matrices able to detect subtle changes at point mutations. The results of our study are also consistent with alternative approaches such as binding free energy calculations but are more informative and sensitive to transient or low-energy interactions. Nevertheless, the energetic contributions of residues at positions 501 and 505 were in good agreement with hydrophobic interaction measurements. The contact PCA networks could identify the intramolecular incidence of the S375F mutation, which occurs in all Omicron variants and likely confers on them an advantage in binding stability. Collectively, these data revealed the major differences observed between WT/Delta and Omicron variants at the RBD/hACE2 interface, which may explain the greater persistence of Omicron.
Author Summary: The evolution of SARS-CoV-2 was extremely rapid, leading to the global predominance of Omicron variants, despite the many mutations identified in the spike protein. Some of these were introduced to evade the immune system, but many others were located in the Receptor Binding Domain (RBD) without affecting its efficient binding to hACE2 and preserving the high infectivity of this variant. To unravel the mechanism by which this protein-protein connection remains strong or stable, it is necessary to study the different types of interactions at the atomic level and over time using molecular dynamics (MD) simulations. Indeed, in contrast to crystal or cryo-EM structures, which provide only a fixed image of the binding process, MD simulations allowed us to unambiguously identify the persistence of some interactions mediated by key residues of spike RBD. This study could also highlight the interchangeable role of certain residues in compensating for a mutation, which in turn allows the virus to maintain durable binding to the host cell receptor.

11

Hi-TrAC reveals fractal nesting of super-enhancers

Cao, Y.; Liu, S.; Cui, K.; Tang, Q.; Zhao, K.

2022-07-16 genomics 10.1101/2022.07.13.499926 medRxiv

Top 0.1%

23.3%

Eukaryotic genome spatial folding plays a key role in genome function. Decoding the principles and dynamics of 3D genome organization depends on improving technologies to achieve higher resolution. Chromatin domains have been suggested as regulatory micro-environments, whose identification is crucial to understanding the genome architecture. We report here that our recently developed method, Hi-TrAC, which specializes in detecting chromatin loops among accessible genomic regulatory regions, allows us to examine active domains with limited sequencing depths at a high resolution. Hi-TrAC can detect active sub-TADs with a median size of 100 kb, most of which harbor one or two cell-specifically expressed genes and regulatory elements such as super-enhancers organized into nested interaction domains. These active sub-TADs are characterized by highly enriched signals of the histone mark H3K4me1 and chromatin-binding proteins, including the Cohesin complex. We show that knocking down a core subunit of the Cohesin complex using shRNAs in human cells, or decreasing the H3K4me1 modification by deleting the H3K4 methyltransferase Mll4 gene in mouse Th17 cells, disrupted the sub-TAD structure. In summary, Hi-TrAC serves as a compatible and highly responsive approach to studying dynamic changes of active sub-TADs, allowing us more explicit insights into delicate genome structures and functions.
Highlights
- Hi-TrAC detects active sub-TADs with a median size of 100 kb.
- Hi-TrAC reveals a block-to-block interaction pattern between super-enhancers, and fractal structures within super-enhancers.
- Active sub-TADs are disrupted by the knockdown of RAD21.
- Active sub-TAD interaction densities are decreased by the knockout of Mll4.

12

BAV-LLPS: A database of bacterial, archaea and virus liquid-liquid phase separation proteins

Rodriguez, C. B.; Tunque Cahui, R. R.; Demitroff, N.; Hirsh, L.; Devos, D. P.; Boccaccio, G.; Parisi, G.

2025-08-18 bioinformatics 10.1101/2025.08.15.670539 medRxiv

Top 0.1%

23.2%

Liquid-liquid phase separation (LLPS) is a key process underlying the formation of biomolecular condensates (BMCs), such as membrane-less organelles (MLOs), that compartmentalize biochemical processes inside the cells. While LLPS has been extensively studied in eukaryotes, its role in bacteria, archaea, and viruses remains far less characterized. Recent studies in bacteria have revealed that LLPS-driven condensates play critical roles in RNA processing, stress response, and pathogenicity. Similarly, many viruses exploit LLPS to facilitate crucial steps in their infection cycles, including viral entry, genome replication, assembly, and host immune evasion. In this work, we introduce a hand-curated database of LLPS proteins from bacteria, archaea, and viruses (BAV-LLPS Database). This resource, extended through sequence similarity searches, comprises over 5,000 proteins and integrates diverse data including biological annotations, sequence features, predicted disordered regions, LLPS per-site probability, and AlphaFold2-based structural models. Additionally, our web server enables users to explore both the curated and homolog-derived datasets, providing a platform to uncover evolutionary relationships and intrinsic and differential properties of LLPS proteins across various taxonomic groups. This work seeks to deepen our understanding of LLPS mechanisms beyond eukaryotic organisms, emphasizing their significance across diverse life forms. It also aims to foster the development of specialized predictive tools that will facilitate the exploration and characterization of LLPS processes in a wide array of living organisms, thereby contributing to advancements in both fundamental biological research and applied biomedical sciences.
Availability and Implementation: BAV-LLPS DB is freely accessible at https://bav-llps-db.bioinformatica.org/. The data can be retrieved from the website. The source code of the database can be downloaded from https://bav-llps-db.bioinformatica.org/download
Contact: gusparisi@gmail.com and gboccaccio@leloir.org.ar

13

Human and bats genome robustness under COSMIC mutational signatures

Song, J.-H.; Zeng, Y.; Davalos, L. M.; MacCarthy, T.; Larijani, M.; Damaghi, M.

2024-09-07 genomics 10.1101/2024.09.05.611453 medRxiv

Top 0.1%

23.2%

Carcinogenesis is an evolutionary process, and mutations can fix the selected phenotypes in selective microenvironments. Both normal and neoplastic cells are robust to the mutational stressors in the microenvironment to an extent that secures their fitness. To test the robustness of genes under a range of mutagens, we developed a sequential mutation simulator, Sinabro, to simulate single base substitution under a given mutational process. Then, we developed a pipeline to measure the robustness of genes and cells under those mutagenesis processes. We discovered significant human genome robustness to the APOBEC mutational signature SBS2, which is associated with viral defense mechanisms and is implicated in cancer. Robustness evaluations across over 70,000 sequences against 41 signatures showed higher resilience under signatures predominantly causing C-to-T (G-to-A) mutations. Principal component analysis indicates that the GC content at the codon wobble position significantly influences robustness, with increased resilience noted under transition mutations compared to transversions. Then, we tested our results in bats at extremes of the lifespan-to-mass relationship and found that the long-lived bat is more robust to APOBEC than the short-lived one. By revealing that robustness to APOBEC ranks highest in the human genome (and in bats, which have a much larger number of APOBEC genes), this work bolsters the key potential role of APOBECs in aging and cancer, as well as evolved countermeasures to this innate mutagenic process. It also provides the baseline of human and bat genome robustness under mutational processes associated with aging and cancer.
Highlights
- Sinabro, the sequential mutation simulator, facilitates measuring the robustness of human protein-coding sequences under all COSMIC mutational signatures.
- Robustness under APOBEC mutational signatures showed the largest mean and standard deviation in the human genome.
- Analysis of robustness to mutational signatures reveals that the role of APOBECs is complementary to cancer in the evolvability of cancer cells in later stages.
- Principal component analysis indicates that the GC content at the codon wobble position significantly influences robustness.
- A long-lived bat (Myotis myotis) has higher robustness to APOBECs than a short-lived one (Molossus molossus) or humans.

14

AI-m6ARS: Machine learning-driven m6A RNA methylation site discovery with integrated sequence, conservation, and geographical descriptors

Uthayopas, K.; de Sa, A. G. C.; Ascher, D. B.

2024-06-18 bioinformatics 10.1101/2024.06.17.599439 medRxiv

Top 0.1%

23.1%

N6-Methyladenosine (m6A) is a predominant type of human RNA methylation, regulating diverse biochemical processes and being associated with the development of several diseases. Despite its significance, an extensive experimental examination across diverse cellular and transcriptome contexts is still lacking due to time and cost constraints. Computational models have been proposed to prioritise potential m6A methylation sites, but their predictive performance is limited by inadequate characterisation and modelling of m6A sites. This work presents AI-m6ARS, a novel model that utilises integrated sequence, conservation, and geographical descriptive features to predict human m6A methylation sites. The model was trained using the Light Gradient Boosting Machine (LightGBM) algorithm, which was coupled with comprehensive feature selection to improve the data quality. AI-m6ARS demonstrates strong predictive capabilities, achieving an area under the receiver operating characteristic curve of 0.87 on cross-validation. Consistent results on unseen transcripts in a blind test highlight the generalisability of AI-m6ARS. AI-m6ARS also demonstrates comparable performance to state-of-the-art models, but offers two significant benefits: model interpretability and the availability of a user-friendly web server. The AI-m6ARS web server offers valuable insights into the distribution of m6A sites within the human genome, thereby facilitating progress in medical applications.

15

Signal peptides as potential structural modulators of human splice isoforms

Tsaban, T.; Schueler-Furman, O.

2025-05-31 bioinformatics 10.1101/2025.05.27.656295 medRxiv

Top 0.1%

23.0%

Signal peptides (SPs) are short protein segments responsible for protein localization, typically cleaved from mature proteins and rapidly degraded after fulfilling their targeting function. Beyond localization, however, these sequentially and structurally diverse elements may play additional roles. We explore how alternative splicing potentially creates structural contexts where SPs become integral components of folded domains. Employing AlphaFold and additional computational approaches, we examined a known functional immune protein splice isoform of human SLAMF6 that retains biological activity, despite forming a truncated domain that lacks the central elements of the canonical interaction interface. We revealed striking characteristic similarities between the SLAMF6 SP and the absent segment, indicating its ability to complement and stabilize the isoform domain. An in silico screen of 235,000 reported expressed human isoforms identified several dozen additional candidates with potential SP complementation, previously dismissed as modeling artifacts. Notably, immunoglobulin and carbonic anhydrase domains show particular enrichment among these candidates. SPs are commonly regarded as dispensable elements that can be altered or eliminated when investigating proteins and their structures. Our research proposes an alternative perspective wherein SPs might perform integral roles in stabilizing their source proteins, or others. These findings extend the growing body of evidence for moonlighting SPs, suggesting that we have only begun to uncover their true functional scope. Specifically, SPs emerge as unique modulatory elements essential for understanding the structural and functional behavior of protein splice isoforms.

16

Compositional restrictions in the flanking regions give potential specificity and strength boost to binding in short linear motifs

Acs, V.; Hatos, A.; Tantos, A.; Kalmar, L.

2024-05-14 bioinformatics 10.1101/2024.05.13.593809 medRxiv

Top 0.1%

23.0%

Short linear motif (SLiM)-mediated protein-protein interactions play important roles in several biological processes where transient binding is needed. They usually reside in intrinsically disordered regions (IDRs), which makes them accessible for interaction. Although information about the possible necessity of the flanking regions surrounding the motifs is increasingly available, it is still unclear whether there are any generic amino acid attributes that need to be functionally preserved in these segments. Here, we describe the currently known ligand-binding SLiMs and their flanking regions with biologically relevant residue features and analyse them based on their simplified characteristics. Our bioinformatics analysis reveals several important properties in the widely diverse motif environment that presumably need to be preserved for proper motif function, but have remained hidden so far. Our results will facilitate the understanding of the evolution of SLiMs, while also holding potential for expanding and increasing the precision of current motif prediction methods.

Author summary: Protein-protein interactions between short linear motifs and their binding domains play key roles in several molecular processes. Mutations in these binding sites have been linked to severe diseases; therefore, interest in the motif research field has been increasing dramatically. Based on the accumulated knowledge, it has become evident that not only the short motif sequences themselves, but also their surrounding flanking regions play crucial roles in motif structure and function. Since most of the motifs tend to be located within highly variable disordered protein regions, searching for functionally important physico-chemical properties in their proximity could facilitate novel discoveries in this field. Here we show that investigating the motif flanking regions based on different amino acid attributes can provide further information on motif function. Using our bioinformatics approach, we have found previously hidden features that are generally present within certain motif categories and thus could be used as additional information in motif searching methods.

17

The importance of DNA sequence for nucleosome positioning in the process of transcriptional regulation

Sahrhage, M.; Paul, N. B.; Beissbarth, T.; Haubrock, M.

2023-08-03 bioinformatics 10.1101/2023.08.01.550795 medRxiv

Top 0.1%

22.7%

Nucleosome positioning is a key factor for transcriptional regulation. Nucleosomes regulate the dynamic accessibility of chromatin and interact with the transcription machinery at every stage. The influences that steer nucleosome positioning are diverse, and the relative importance of the DNA sequence versus active chromatin remodeling has long been a subject of discussion. In this study, we evaluate the functional role of DNA sequence for all major elements along the process of transcription. We developed a random forest classifier based on local DNA structure that assesses the sequence-intrinsic support for nucleosome positioning. On this basis, we created a simple data resource that we applied genome-wide to the human genome. In our comprehensive analysis, we found a special role of DNA in mediating the competition of nucleosomes with cis-regulatory elements, in enabling steady transcription, in positioning stable nucleosomes in exons, and in repelling nucleosomes during transcription termination. In contrast, we relate these findings to concurrent processes that generate strongly positioned nucleosomes in vivo that are not mediated by sequence, such as energy-dependent remodeling of chromatin.

18

Disrupted chromatin architecture in olfactory sensory neurons: A missing link from COVID-19 infection to anosmia

Tan, Z. W.; Toon, P. J.; Guarnera, E.; Berezovsky, I. N.

2022-08-19 bioinformatics 10.1101/2022.08.19.504545 medRxiv

Top 0.1%

22.6%

We tackle here the genomic mechanisms of the rapid onset of and recovery from anosmia, a useful diagnostic indicator for early-stage COVID-19 infection. On the basis of earlier observed specifics of olfactory receptor (OR) regulation in mouse chromatin structures, we hypothesized that the disruption of OR function can be caused by chromatin reorganization taking place upon SARS-CoV-2 infection. We reconstructed the chromatin ensembles of ORs obtained from COVID-19 patients and control samples using our original computational framework for whole-genome chromatin ensemble 3D reconstruction. We have also developed here a new procedure for the analysis of fine structural hierarchy in local, megabase-scale parts of chromosomes containing the OR genes and corresponding epigenetic factors. We observed structural modifications in COVID-19 patients on different levels of chromatin organization, from alteration of the whole genome structure and chromosomal intermingling to reorganization of contacts between the chromatin loops at the level of topologically associating domains. While complementary data on known regulatory elements point to pathology-associated changes within the overall picture of chromatin alterations, further investigation using additional epigenetic factors mapped on 3D reconstructions with improved resolution will be required for better understanding of anosmia caused by SARS-CoV-2 infection.

19

Effects of Extruder Dynamics and Noise on Simulated Chromatin Contact Probability Curves

Konstantinov, V.; Artem, S.; Lagunov, T.

2025-06-12 genomics 10.1101/2025.06.09.658566 medRxiv

Top 0.1%

22.5%

Loop extrusion by SMC complexes is a key mechanism underlying chromatin folding during both interphase and mitosis. Despite this shared mechanism, computational models of loop extrusion often rely on fundamentally different assumptions: interphase models typically use dynamic extruders with finite lifetimes, whereas mitotic models employ static extruders placed according to loop size distributions. In this work, we investigate whether these modeling paradigms are interchangeable or yield intrinsically incompatible results. Using publicly available Hi-C data from mitotic chicken cells, we systematically compare dynamic and static loop extrusion models implemented in the Polychrom framework. We evaluate how key parameters such as the extruder lifetime, extrusion velocity, and spatial noise affect the simulated contact probability curves P(s) and loop size distributions. Our results reveal that while both model types can be tuned to approximate the general shape of P(s), they produce distinct internal structures and divergent relationships between loop size and contact decay. We also show that increased extruder lifetimes lead to excessive nested loop formation, which alters both loop statistics and P(s) derivatives. Introducing spatial exclusion constraints between extruders partially restores consistency with static models. These findings highlight that differences in extruder behavior and polymer noise levels can significantly impact chromatin model outcomes and must be carefully accounted for when interpreting or comparing simulation results across biological conditions.

Author summary: Chromatin organization plays a crucial role in gene regulation and cellular function, yet our understanding of its three-dimensional structure relies heavily on computational modeling and the interpretation of complex experimental data. In this study, we use coarse-grained modeling approaches to simulate chromatin folding and systematically investigate how different analysis metrics and data processing methods influence the conclusions drawn from such models. By comparing widely used metrics and exploring the effects of normalization and noise, we highlight potential pitfalls and biases that can arise in chromatin modeling studies. Our findings provide practical recommendations for researchers in the field, aiming to improve the robustness and reproducibility of computational analyses of chromatin architecture. This work will help guide future studies toward more reliable interpretations of chromatin structure and its biological implications.

20

Intrinsically disordered protein mutations can drive cancer and their targeted interference extends therapeutic options

Meszaros, B.; Hajdu-Soltesz, B.; Zeke, A.; Dosztanyi, Z.

2020-04-30 bioinformatics 10.1101/2020.04.29.069245 medRxiv

Top 0.1%

22.3%

Many proteins contain intrinsically disordered regions (IDRs) which carry out important functions without relying on a single well-defined conformation. IDRs are increasingly recognized as critical elements of regulatory networks and have also been associated with cancer. However, it is unknown whether mutations targeting IDRs represent a distinct class of driver events associated with specific molecular and system-level properties, cancer types and treatment options. Here, we used an integrative computational approach to explore the direct role of intrinsically disordered proteins/protein regions (IDPs/IDRs) in driving cancer. We showed that around 20% of cancer drivers are primarily targeted through a disordered region. The detailed analysis of these IDRs revealed that they can function in multiple ways that are distinct from the functional mechanisms of ordered drivers. Disordered drivers play a central role in context-dependent interaction networks and are enriched in specific biological processes such as transcription, gene expression regulation and protein degradation. Furthermore, their modulation represents an alternative mechanism for the emergence of all known cancer hallmarks independently of the modulation of globular proteins. Disordered drivers are also highly relevant at the sample level, and their mutations can represent the key driving event in certain individual cancer patients. However, treatment options for such patients are currently severely limited. The presented study highlights a largely overlooked class of cancer drivers associated with specific cancer types that need novel therapeutic options.